AI Learns Parallel Parking - Deep Reinforcement Learning

Big thanks to Hostinger for sponsoring this video! Go to https://hostinger.com?REFERRALCODE=1SAMUEL08 and get 20% off your hosting plan.
I'm really happy with how my new portfolio turned out! https://samuelarzt.com

Also check out the original parking video:    • AI Learns to Park - Deep Reinforcemen...  

Two AI fight for the same parking spot:    • Two AI Fight for the same Parking Spot  

Neural Networks explained in a Minute:    • Explained In A Minute: Neural Networks  

Subscribe for more content like this:
   / @samuelarzt  

Follow me on Twitter for more frequent updates on my projects:
  / samuelarzt  

Last time we trained an AI to park (• AI Learns to Park - Deep Reinforcemen...). Many people suggested in the comments on that video that we try parallel parking next, so that is what this video is all about. We use the same methods as last time and try different adjustments to the learning algorithm and the environment to make the agent generalize better and park more precisely.

The simulation was implemented using Unity's ML-Agents framework (https://unity3d.com/machine-learning). The AI consists of a deep Neural Network with 3 hidden layers of 128 neurons each. It is trained with the Proximal Policy Optimization (PPO) algorithm, which is a Reinforcement Learning approach.
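For readers curious about what PPO optimizes, here is a minimal sketch of its clipped surrogate objective in plain Python. It is not the ML-Agents implementation: the function name and the clip value of 0.2 are assumptions for illustration, and the video does not state which hyperparameters were used.

```python
import math


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective, to be maximized.

    clip_eps=0.2 is an assumed value, not one taken from the video.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the policy
        # that collected the data.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the two estimates.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

The clipping removes the incentive to move the policy far from the one that gathered the experience in a single update, which is the core idea behind PPO's stability.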

Basically, the inputs to the Neural Network are the readings of eight depth sensors, the car's current speed and position, and its position relative to the target. The outputs of the Neural Network are interpreted as engine force, braking force and turning force. These outputs can be seen in the top-right corner of the zoomed-out camera shots.
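To make the shape of the network concrete, here is a rough sketch of a forward pass in plain Python. Only the sensor count, the 3 hidden layers of 128 neurons and the three outputs come from the description. The 2D position and target offset, the tanh activations, the weight initialization and all function names are assumptions for illustration, not the actual ML-Agents model.

```python
import math
import random

NUM_SENSORS = 8      # from the description
HIDDEN_LAYERS = 3    # from the description
HIDDEN_UNITS = 128   # from the description
NUM_OUTPUTS = 3      # engine force, braking force, turning force


def build_observation(sensor_readings, speed, position, target_offset):
    # Assumption: position and target offset are 2D vectors.
    return list(sensor_readings) + [speed] + list(position) + list(target_offset)


def init_network(input_size, rng):
    # Random initial weights, matching the "random behaviour" starting point.
    sizes = [input_size] + [HIDDEN_UNITS] * HIDDEN_LAYERS + [NUM_OUTPUTS]
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)  # assumed initialization scale
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, observation):
    activations = observation
    for weights, biases in layers:
        # Assumption: tanh on every layer, so outputs land in [-1, 1].
        activations = [
            math.tanh(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


rng = random.Random(0)
observation = build_observation([1.0] * NUM_SENSORS, 0.5, (0.0, 0.0), (2.0, -1.0))
network = init_network(len(observation), rng)
engine, brake, turn = forward(network, observation)
```

With these assumptions the observation has 13 values, and the three outputs would still need to be scaled to whatever force ranges the car controller expects.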

The AI starts off with random behaviour, i.e. the Neural Network is initialized with random weights. It then gradually learns to solve the task by adjusting its behaviour in response to feedback from the environment. The environment tells the AI whether it is doing well or badly through positive or negative reward signals.
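As a small illustration of how reward signals turn into a learning target, the sketch below computes discounted returns for one episode. The discount factor and the example rewards are hypothetical; the actual reward function used in the video is not described here.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return the discounted sum of future rewards for every time step.

    gamma=0.99 is an assumed discount factor, not a value from the video.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the next step.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical episode: two small penalties, then a reward for parking.
episode_returns = discounted_returns([-0.1, -0.1, 1.0])
```

Actions that lead to a later positive reward end up with a higher return than actions that lead to penalties, which is the signal the learning algorithm uses to shift the policy.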

Training ran on a computer with an i5 CPU (7th or 8th gen) and a GTX 1070 GPU at 100x simulation speed, using 6 instances of the environment and up to 6 processes running in parallel.

Music from Bensound.com:
Timelapse Music: "The Elevator Bossa Nova"
Outro: "All That"

Background Music 1: Drops of H2O ( The Filtered Water Treatment ) by J.Lang (c) copyright 2012 Licensed under a Creative Commons Attribution (3.0) license. http://dig.ccmixter.org/files/djlang5... Ft: Airtone

#ArtificialIntelligence #MachineLearning #ReinforcementLearning #AI #NeuralNetworks #hostinger #inspeedwebelieve #speedfreak
